MPI on a Million Processors
Authors
Abstract
Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine the issue of how scalable MPI is. We first examine the MPI specification itself, discuss areas with scalability concerns, and show how they can be overcome. We then investigate issues that an MPI implementation must address in order to be scalable. We ran experiments to measure MPI memory consumption at scale on up to 131,072 processes, or 80% of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in the algorithmic scalability of applications to large process counts, and features of MPI that enable the use of other techniques to overcome scalability limitations in applications.
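Memory-footprint measurements of the kind described above can be reproduced in spirit with a small probe program. The following sketch is an illustration only, not the instrumentation used in the paper: it assumes a Linux-style /proc/self/status with a VmRSS field, and reports the maximum resident-set size across all ranks immediately after MPI_Init.

    /* Hedged sketch: per-process memory right after MPI_Init, max over ranks.
     * Reading /proc/self/status is a Linux convention; other systems (e.g.
     * Blue Gene/P's CNK) expose this information differently. */
    #include <mpi.h>
    #include <stdio.h>

    static long rss_kb(void)            /* resident-set size in kB, or -1 */
    {
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];
        long kb = -1;
        if (!f) return -1;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
        fclose(f);
        return kb;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        long mine, max;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        mine = rss_kb();
        MPI_Reduce(&mine, &max, 1, MPI_LONG, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%d processes: max RSS after MPI_Init = %ld kB\n", size, max);
        MPI_Finalize();
        return 0;
    }

Running such a probe at increasing process counts exposes any per-process data structures in the MPI library that grow with the number of ranks.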
Similar resources
PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications
The size of supercomputers in numbers of processors is growing exponentially. Today's largest supercomputers have upwards of a hundred thousand processors, and tomorrow's may have on the order of one million. The applications that run on these systems commonly coordinate their parallel activities via MPI; a trace of these MPI communication events is an important input for tools that visualize, simu...
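Event tracers of this kind typically rely on the standard MPI profiling interface (PMPI), in which any MPI call can be intercepted by a same-named wrapper that records an event and then forwards to the corresponding PMPI_ entry point. The sketch below shows the idea for MPI_Send; the trace line format is invented for illustration and is not the PSINS format.

    /* Hedged sketch of PMPI-based tracing (MPI-3 signature; drop the const
     * qualifier on buf for older MPI headers). */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "TRACE rank=%d MPI_Send dest=%d count=%d dt=%.6fs\n",
                rank, dest, count, MPI_Wtime() - t0);
        return rc;
    }

Linking such wrappers into an application (or preloading them as a shared library) produces a per-rank event log without modifying the application source.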
Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication
Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on interprocess communication. In the case of distributed matrix-matrix multiplications, much of this time is spent on interchanging the partial results that are needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel spa...
Study of parallel programming models on computer clusters with Intel MIC coprocessors
Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the...
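A minimal sketch of the hybrid style such studies compare is given below, assuming one MPI process per node or coprocessor with OpenMP threads inside it; MPI_THREAD_FUNNELED is chosen only as an example thread-support level.

    /* Hedged sketch: hybrid MPI+OpenMP "hello" showing the usual structure. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        /* Only the main thread makes MPI calls; worker threads do compute. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel
        {
            printf("rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }
        MPI_Finalize();
        return 0;
    }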
rMPI: Message Passing on Multicore Processors with On-Chip Interconnect
With multicore processors becoming the standard architecture, programmers are faced with the challenge of developing applications that capitalize on multicore's advantages. This paper presents rMPI, which leverages the on-chip networks of multicore processors to build a powerful abstraction with which many programmers are familiar: the MPI programming interface. To our knowledge, rMPI is the fir...
MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling
Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments and the calculations are distributed over different processors. ...
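The parallelization pattern described here is typically a domain decomposition with halo (ghost-cell) exchange between neighbouring subdomains at every time step. The sketch below shows that idea for a 1-D decomposition in MPI; the array size, number of time steps, and the omitted stencil update are placeholders, not the scheme used in the cited work.

    /* Hedged sketch: 1-D domain decomposition with halo exchange for an
     * explicit finite-difference time loop. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024   /* local interior points per rank (placeholder) */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *u = calloc(N + 2, sizeof *u);   /* u[0], u[N+1] are halo cells */
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int step = 0; step < 100; step++) {
            /* exchange halo cells with neighbouring subdomains */
            MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left,  0,
                         &u[N + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 1,
                         &u[0], 1, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* ... apply the finite-difference stencil to u[1..N] here ... */
        }

        free(u);
        MPI_Finalize();
        return 0;
    }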
Publication date: 2009